Comparative Study of Attribute Selection Using Gain Ratio and Correlation Based Feature Selection
Abstract
Feature subset selection is of great importance in data mining. High-dimensional data makes training and testing of general classification methods difficult. In this paper, two filter approaches, gain ratio and correlation-based feature selection (CFS), are used to illustrate the significance of feature subset selection for classifying the Pima Indian diabetic database (PIDD). The C4.5 decision tree uses gain ratio to determine its splits and to select the most important features. A genetic algorithm is used as the search method, with CFS as the subset evaluation mechanism. The resulting feature subsets are then tested using two supervised classification methods, a back-propagation neural network and a radial basis function network. Experimental results show that the feature subsets selected by the CFS filter yielded a marginal improvement in classification accuracy for both the back-propagation neural network and the radial basis function network, compared with the feature subset selected by the information gain filter.
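As a rough illustration of the two filter criteria named in the abstract, the sketch below computes the gain ratio of a single discrete feature and the standard CFS merit score of a feature subset. It is not the paper's implementation: the function names are hypothetical, features are assumed to be already discretized, and the correlations passed to `cfs_merit` are assumed to be computed elsewhere by the caller. The genetic search over subsets is not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    # Group the class labels by feature value.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information is the entropy of the feature's own value distribution.
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


def cfs_merit(feature_class_corr, feature_feature_corr):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    feature_class_corr: one correlation per feature in the subset (with the class).
    feature_feature_corr: one correlation per pair of features in the subset.
    Mean absolute correlations are used for r_cf and r_ff (an assumption here).
    """
    k = len(feature_class_corr)
    r_cf = sum(abs(r) for r in feature_class_corr) / k
    if feature_feature_corr:
        r_ff = sum(abs(r) for r in feature_feature_corr) / len(feature_feature_corr)
    else:
        r_ff = 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

In this sketch, a feature that separates the classes perfectly gets a gain ratio of 1, and adding a feature that is fully redundant with one already in the subset lowers the CFS merit. That is the behavior a subset search would exploit when CFS is the evaluation function.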
Related articles
Fuzzy-rough Information Gain Ratio Approach to Filter-wrapper Feature Selection
Feature selection for various applications has been carried out for many years in many different research areas. However, there is a trade-off between finding feature subsets with minimum length and increasing the classification accuracy. In this paper, a filter-wrapper feature selection approach based on fuzzy-rough gain ratio is proposed to tackle this problem. As a search strategy, a modifie...
A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)
Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...
Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets
With the advancement of metagenome data mining, science has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection or feature selection is essential. Removing redundant genes can therefore increase the speed and accuracy of classification. After applying the gene se...
Machine Learning based Approach for protein Function Prediction using Sequence Derived Properties
Protein function prediction is an important and challenging field in bioinformatics. Various machine learning based approaches have been proposed to predict protein functions using sequence-derived properties. In this paper 857 sequence-derived features such as amino acid composition, dipeptide composition, correlation, composition, transition and distribution and pseudo amino aci...
Applying Feature-Selection Algorithm to Predict Landslide in the Southwest of Iran
Extended abstract. 1. Introduction. Nowadays people are increasingly sensitive to landslides, especially in mountainous areas, because of changes in land use and the expansion of communication networks (Gvrsysky et al., 2006). In the twentieth century, Asia had the highest incidence of landslides (220 landslides). Latin America has had the highest number of casualties (more than 2,...
Publication date: 2010